Distributional transformations, orthogonal 
polynomials, and Stein characterizations 
Abstract

A new class of distributional transformations is introduced, characterized by equations relating function-weighted expectations of test functions on a given distribution to expectations of the transformed distribution on the test function's higher order derivatives. The class includes the size and zero bias transformations. When specializing to weighting by polynomial functions, it relates distributional families closed under independent addition, and in particular the infinitely divisible distributions, to the family of transformations induced by their associated orthogonal polynomial systems. For these families, generalizing a well known property of size biasing, sums of independent variables are transformed by replacing summands chosen according to a multivariate distribution on the index set by independent variables whose distributions are transformed by members of that same family. A variety of the transformations associated with the classical orthogonal polynomial systems have as fixed points the original distribution, or a member of the same family with a different parameter.



1 Introduction 

The zero bias transformation was introduced in [13]. This mapping enjoys properties similar to those of the well known size biased transformation on non-negative variables, but can be applied to mean zero random variables. One main feature of the zero bias transformation is that its unique fixed point is the mean zero normal distribution, and for this reason it has been applied in Stein's method for the purpose of normal approximation ([13], [11], [12], and [14]). The zero bias transformation is also related to the $K$ function in the work [?], [22] and [?] and others; for a good overview see [?].

We place the classical size bias transformation and the zero bias transformation in a broader context, showing that both are particular cases of transforming a given distribution $X$ into $X^{(P)}$ through the use of a measurable 'biasing' function $P$. To be more precise, for the given $X$ and $P$ let $C^m$ denote the collection of functions whose $m$th derivative exists and is measurable on $\mathbb{R}$, and, suppressing $X$ on the left hand side, set

$$\mathcal{F}^m(P) = \{F \in C^m : E|P(X)F(X)| < \infty\}.$$

We consider transformations characterized by

$$E\,P(X)F(X) = \alpha\, E F^{(m)}(X^{(P)}) \quad \text{for all } F \in \mathcal{F}^m(P), \tag{1}$$

where necessarily $\alpha = (m!)^{-1} E\,P(X)X^m$ when $x^m \in \mathcal{F}^m(P)$; we insist $\alpha > 0$. We call this distribution the $X$-$P$ biased distribution. For discrete distributions the differential operator is replaced by the difference operator; see Sections 4.3 and 4.4.

Theorem 2.1 provides our most general conditions for the existence of transformations characterized by (1), where the 'biasing' function $P$ is only required to have $m$ sign changes and to satisfy certain orthogonality and positivity conditions; we call $m$ the order of the resulting transformation. In the particular case where $P$ is a polynomial, the sign change condition can be expressed in terms of the roots and degree of $P$, and the orthogonality properties in terms of moments. For example, for each $m = 0, 1, \ldots$ there exists a distributional transformation which is defined using the Hermite polynomial of order $m$ as the biasing function, and whose domain consists of those distributions whose first $2m$ moments match those of the mean zero normal; the case $m = 1$ corresponds to the zero bias transformation.

Theorem 2.1 in Section 2 shows that distributional transformations exist in great generality. In Section 3 we find that there is considerable additional structure for the families of transformations induced by orthogonal polynomial systems, especially those corresponding to families of distributions which are closed under addition of independent variables. Corresponding to the Normal, Gamma, Poisson, Binomial and Beta-type distributions, in Section 4 we study the families of transformations defined using the Hermite, Laguerre, Charlier, Krawtchouk, and Gegenbauer polynomials, and obtain high order Stein type characterizing equations.

Our work here is in the spirit of [9], where other fundamental connections between Stein equations and orthogonal polynomials were first described. The approach in [9] is iterated in [21], combining it with well-known connections between orthogonal polynomials and birth and death processes, and used as in [17] to describe solutions of Stein equations.

We first review some well known facts regarding the size bias transformation, which is the simplest and best known of all these distributional transformations. For non-negative $X$ with $0 < EX = \mu < \infty$, the $X$-size biased distribution $X^s$ is defined by the characterizing equation

$$E\,XF(X) = \mu\, E F(X^s) \quad \text{for all } F \in \mathcal{F}^0(X). \tag{2}$$

One key feature of the size bias transformation is the following. If $X_1, \ldots, X_n$ are independent non-negative variables with finite positive expectations $EX_i = \mu_i$ and

$$W = \sum_{i=1}^n X_i,$$

then a variable with the $W$-size biased distribution can be constructed by replacing a variable $X_I$, chosen with probability proportional to $\mu_I$, by an independent variable $X_I^s$ having the $X_I$-size biased distribution. In other words, letting

$$P(I = i) = \frac{\mu_i}{\sum_{j=1}^n \mu_j}$$

be independent of $X_1, \ldots, X_n$, the variable

$$W^s = W - X_I + X_I^s \tag{3}$$

has the $W$-size biased distribution. Letting $x^+ = \max(0, x)$, size biasing is the case of (1) with biasing function $P(x) = x^+$. This transformation is of order zero, as there are $m = 0$ sign changes of $x^+$ on $\mathbb{R}$, and has $\alpha = EX^+$; when $X \ge 0$ we have $X^+ = X$, resulting in the usual characterization (2).
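The replacement construction (3) can be checked exactly in a small case. The sketch below is our own illustration, not taken from the text: it assumes Bernoulli summands, for which each $X_i$-size biased variable is the constant 1, and compares both sides of (2) for $W$ using exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction


def bernoulli_sum_law(ps):
    """Exact law {w: P(W = w)} of W = X_1 + ... + X_n, X_i ~ Bernoulli(p_i) independent."""
    law = {0: Fraction(1)}
    for p in ps:
        new = {}
        for w, pr in law.items():
            new[w] = new.get(w, 0) + pr * (1 - p)
            new[w + 1] = new.get(w + 1, 0) + pr * p
        law = new
    return law


def size_biased_by_replacement(ps):
    """Law of W^s = W - X_I + X_I^s from (3), with P(I = i) = mu_i / sum_j mu_j.

    For X_i ~ Bernoulli(p_i) the X_i-size biased variable is the constant 1,
    so W - X_I + X_I^s is the sum of the remaining summands plus 1.
    """
    total = sum(ps)
    law = {}
    for i, p_i in enumerate(ps):
        rest = bernoulli_sum_law(ps[:i] + ps[i + 1:])
        for w, pr in rest.items():
            law[w + 1] = law.get(w + 1, 0) + (p_i / total) * pr
    return law


def size_bias_sides(ps, F):
    """Return (E[W F(W)], E[W] E[F(W^s)]); by (2) and (3) they should coincide."""
    law = bernoulli_sum_law(ps)
    lhs = sum(pr * w * F(w) for w, pr in law.items())
    rhs = sum(ps) * sum(pr * F(w) for w, pr in size_biased_by_replacement(ps).items())
    return lhs, rhs


ps = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)]
print(size_bias_sides(ps, lambda w: w ** 3))
```

Because the test function enters only through its values on the integers, any $F$ can be tried; the two returned numbers agree exactly.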

The zero bias transformation [13] was motivated by the similarity between the size bias transformation and the Stein equation [21] for the mean zero normal distribution. In particular, Stein's identity says that $Z \sim \mathcal{N}(0, \lambda)$ if and only if

$$E\,ZF(Z) = \lambda\, E F'(Z) \quad \text{for all } F \in \mathcal{F}^1(Z). \tag{4}$$

Comparing (4) to (2), for a mean zero variable $X$ with positive variance $\lambda$, we say that $X^z$ has the $X$-zero biased distribution if

$$E\,XF(X) = \lambda\, E F'(X^z) \quad \text{for all } F \in \mathcal{F}^1(X). \tag{5}$$

Note that (5) for zero biasing is the same as (2) for size biasing, but with variance replacing mean, and $F'$ replacing $F$. That the normal distribution with variance $\lambda$ is the unique fixed point of the zero bias transformation follows immediately from the characterization (4). It was shown in [13] that the zero bias distribution $X^z$ exists for all $X$ that have mean zero and finite positive variance. Its existence also follows from Theorem 2.1 as the special case of (9) for the function $P(x) = x$, having $m = 1$ sign change on $\mathbb{R}$, and $\alpha$ equal to the variance $\lambda$ of $X$.

The zero bias transformation was introduced and used in [13] to obtain bounds of order $n^{-1}$ in normal approximations for smooth test functions under third order moment conditions, in the presence of dependence induced by simple random sampling. In [?] it is used to provide bounds to the normal distribution for hierarchical sequences generated by the iteration of a so called averaging function, in [?] for normal approximation in combinatorial central limit theorems with random permutations having distribution constant over cycle type, and in [?] the extension of the zero bias transformation to higher dimension is considered.

The zero bias transformation enjoys a property similar to (3) for size biasing. In particular, it was shown in [?] that a sum of independent mean zero variables with finite variances can be zero biased by replacing one variable, chosen with probability proportional to its variance, by an independent variable from that summand's zero biased distribution. Precisely, let $X_1, \ldots, X_n$ be independent mean zero variables with variances $\lambda_i = EX_i^2 > 0$,

$$W = X_1 + \cdots + X_n, \qquad \lambda = \lambda_1 + \cdots + \lambda_n,$$

and $I$ a random index, independent of $X_1, \ldots, X_n$, with distribution

$$P(I = i) = \frac{\lambda_i}{\lambda}. \tag{6}$$

Then

$$W^z = W - X_I + X_I^z \tag{7}$$

has the $W$-zero biased distribution, where $X_i^z$ is a variable independent of $X_j$, $j \ne i$, having the $X_i$-zero biased distribution. This construction is extended to the families of transformations associated with orthogonal polynomials in Theorem 3.1. In particular, in Section 3 we see that for higher order transformations, sums of independent variables are transformed by replacing multiple variables chosen according to some distribution (e.g. multinomial, multivariate hypergeometric) with independent variables possessing distributions transformed by the same family.
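Similarly, (6) and (7) can be checked exactly when each summand is a symmetric two-point variable $X_i = \pm a_i$, whose zero biased distribution is uniform on $[-a_i, a_i]$ (a fact easily checked from (5)). The sketch below, with hypothetical function names, uses that for $U$ uniform on $[-a, a]$ one has $E F'(s + U) = (F(s+a) - F(s-a))/(2a)$, so only $F$ itself is required and the comparison is exact.

```python
from fractions import Fraction
from itertools import product


def two_point_sum_law(scales):
    """Exact law of W = X_1 + ... + X_n with X_i = +a_i or -a_i, each with probability 1/2."""
    law = {}
    weight = Fraction(1, 2 ** len(scales))
    for signs in product((1, -1), repeat=len(scales)):
        w = sum((s * a for s, a in zip(signs, scales)), Fraction(0))
        law[w] = law.get(w, 0) + weight
    return law


def zero_bias_sides(scales, F):
    """Return (E[W F(W)], lambda E[F'(W^z)]) with W^z = W - X_I + X_I^z as in (6) and (7).

    X_i^z is uniform on [-a_i, a_i]; for such U and any s,
    E F'(s + U) = (F(s + a_i) - F(s - a_i)) / (2 a_i), so F' is never evaluated.
    """
    law = two_point_sum_law(scales)
    lhs = sum(pr * w * F(w) for w, pr in law.items())
    variances = [a * a for a in scales]
    lam = sum(variances)
    expect_fprime = Fraction(0)
    for i, a in enumerate(scales):
        rest = two_point_sum_law(scales[:i] + scales[i + 1:])
        p_i = variances[i] / lam                      # P(I = i), as in (6)
        for s, pr in rest.items():
            expect_fprime += p_i * pr * (F(s + a) - F(s - a)) / (2 * a)
    return lhs, lam * expect_fprime


scales = [Fraction(1), Fraction(2), Fraction(1, 2)]
print(zero_bias_sides(scales, lambda x: x ** 4 - 2 * x))
```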

In Section 2 we give the moment and sign change conditions on $P$ which guarantee the existence of the $X$-$P$ biased distribution, and provide an explicit construction. In Section 3 we treat the special case where $P$ is a member of a family of orthogonal polynomials. The generalization to higher order of the 'replace one variable' zero and size bias constructions is based on the identity (25), expressing an orthogonal polynomial of a sum as a sum of like polynomials with summands having no larger order, and is given in Section 3. In Sections 4.1, 4.2, 4.3, 4.4 and 4.5 we treat the Hermite, Laguerre, Charlier, Krawtchouk and Gegenbauer polynomials, corresponding to the Normal, Gamma, Poisson, Binomial and Beta-type distributions respectively. Special instances of the Beta-type distributions we consider are the uniform $\mathcal{U}[-1, 1]$, the arcsine, and the semi-circle distribution.

2 Transformations in General 

We begin our study with the following existence and uniqueness theorem for the types of distributional transformations under consideration. We say the measurable function $P$ on $\mathbb{R}$ is positive on an interval $I$ if $P(x) \ge 0$ for all $x \in I$ with strict inequality for at least one $x$, and similarly for $P$ negative on $I$. We say $P$ has exactly $m = 0, 1, \ldots$ sign changes if $\mathbb{R}$ can be partitioned into $m + 1$ disjoint subintervals with non-empty interior such that $P$ alternates sign on successive intervals. Though the choices for the endpoints of such intervals may be somewhat arbitrary when there are intervals where $P$ is zero, we will nevertheless say that a sign change occurs at the interval boundaries; the uniqueness guaranteed by Theorem 2.1 shows that the $X$-$P$ biased distribution constructed in the proof of Theorem 2.1 is the same for all interval boundary choices, and Example 2.1 gives some additional explanation of this phenomenon in the context of a particular example. We note that, for existence in general, the orthogonality conditions required by Theorem 2.1 are only relative to $P$ and required only up to a finite order; such conditions may not impose boundedness on any of the power moments of $X$, as illustrated in Example 2.1.

Theorem 2.1 Let $X$ be a random variable, $m \in \{0, 1, 2, \ldots\}$ and $P$ a measurable function with exactly $m$ sign changes, positive on its rightmost interval and satisfying

$$\frac{1}{m!} E X^k P(X) = \alpha\, \delta_{k,m}, \qquad k = 0, \ldots, m, \tag{8}$$

with $\alpha > 0$. Then there exists a unique distribution for a random variable $X^{(P)}$ such that

$$E\,P(X)F(X) = \alpha\, E F^{(m)}(X^{(P)}) \quad \text{for all } F \in \mathcal{F}^m(P). \tag{9}$$

Theorem 2.1 says that $X$ is in the domain of the distributional transformation of order $m$ defined using the 'biasing' function $P$ having $m$ sign changes when the powers of $X$ smaller than $m$ are orthogonal to $P(X)$ in the $L^2(X)$ sense, that is, when $P(X) \in \{1, X, \ldots, X^{m-1}\}^{\perp}$, and $EX^mP(X) > 0$. As noted above, the size and zero bias transformations both arise as special cases.

Proof of Theorem 2.1. We give an explicit construction of the variate $X^{(P)}$. By replacing $P$ by $P/\alpha$, it suffices to prove the theorem for $\alpha = 1$. Label the points where the $m$ sign changes of $P$ occur as $r_1, \ldots, r_m$, and let

$$Q(x) = \prod_{i=1}^m (x - r_i), \tag{10}$$

adopting the usual convention that an empty product is 1. By construction $Q$ and $P$ have the same sign, so, letting $\mu_X$ denote the distribution of $X$,

$$d\mu_Y(y) = \frac{1}{m!} Q(y)P(y)\, d\mu_X(y) \tag{11}$$

is a measure, and since (8) with $k = m$ implies that $EQ(X)P(X) = m!$, it is a probability measure. Now with $Y$ and $\{U_j\}_{j \ge 1}$ mutually independent, $Y$ having distribution $\mu_Y$ and $U_j$ having distribution function $u^j$ on $[0, 1]$, and with $r_0 = Y$ and $r_{m+1} = 0$, we claim that

$$X^{(P)} = \sum_{k=1}^{m+1} \Big( \prod_{i=k}^{m} U_i \Big) (r_{k-1} - r_k) \tag{12}$$

satisfies (9), thus proving the existence of the $X$-$P$ biased distribution.
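We begin by noting that for any $F$ for which either side below exists,

$$E F(Y) = \frac{1}{m!} E F(X) Q(X) P(X), \tag{13}$$

and so for $k = 0, \ldots, m$, letting

$$R_k(x) = \prod_{i=k+1}^{m} (x - r_i),$$

a monic polynomial of degree $m - k$, by (13) and (8) (which give $EX^jP(X) = 0$ for $j < m$ and $EX^mP(X) = m!$) we have

$$E\Big( \frac{1}{\prod_{i=1}^k (Y - r_i)} \Big) = \frac{1}{m!} E R_k(X)P(X) = \delta_{k,0}. \tag{14}$$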

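We show the claim by induction. In particular, for $k \ge 1$ letting

$$V_k = \prod_{i=k}^{m} U_i, \qquad W_k = \sum_{j=k}^{m+1} V_j (r_{j-1} - r_j), \tag{15}$$

so that $V_{m+1} = 1$ and $W_{m+2} = 0$, and taking $X^{(P)}$ as in (12), we show that for all $F \in C_c^\infty$, the collection of infinitely differentiable functions with compact support, and $k = 0, \ldots, m$,

$$E F^{(m)}(X^{(P)}) = k!\, E\left[ \frac{F^{(m-k)}\big(V_{k+1}(Y - r_{k+1}) + W_{k+2}\big)}{V_{k+1}^{k} \prod_{i=1}^{k} (Y - r_i)} \right]. \tag{16}$$

We see that the expectation on the right hand side exists since $F$ and all its derivatives are bounded, $V_{k+1}$ is independent of $Y$ for all $k$, $EU_i^{-k} < \infty$ for $i \ge k + 1$, and by use of (13).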

The case $k = 0$ is the statement that $X^{(P)} = V_1(Y - r_1) + W_2$, which follows from definitions (12) and (15). Assume (16) holds for some $0 \le k < m$. Using $V_{k+1} = U_{k+1}V_{k+2}$ in (16) and taking expectation over $U_{k+1}$, with density $(k+1)u^k$, we obtain

$$E F^{(m)}(X^{(P)}) = k!\, E \int_0^1 \frac{F^{(m-k)}\big(u V_{k+2}(Y - r_{k+1}) + W_{k+2}\big)}{u^k V_{k+2}^k \prod_{i=1}^k (Y - r_i)}\,(k+1)u^k\, du.$$

Cancelling and integrating, we obtain

$$E F^{(m)}(X^{(P)}) = (k+1)!\, E\left[ \frac{F^{(m-(k+1))}\big(V_{k+2}(Y - r_{k+1}) + W_{k+2}\big) - F^{(m-(k+1))}(W_{k+2})}{V_{k+2}^{k+1} \prod_{i=1}^{k+1} (Y - r_i)} \right]. \tag{17}$$

Using the independence of $V_{k+2}$ and $Y$ for any $k$, and that $W_{k+2}$ is independent of $Y$ for all $k \ge 0$, the second term in the expectation (17) vanishes by (14), since $k + 1 \ge 1$. The induction is completed by noting that definitions (15) give $V_{k+2}(Y - r_{k+1}) + W_{k+2} = V_{k+2}(Y - r_{k+2}) + W_{k+3}$.

Now applying (16) for $k = m$ and using $V_{m+1} = 1$, $W_{m+2} = 0$ and $r_{m+1} = 0$, we obtain

$$E F^{(m)}(X^{(P)}) = m!\, E\left[ \frac{F(Y)}{\prod_{i=1}^m (Y - r_i)} \right] = E\,P(X)F(X)$$

by (13). That is, the equality in (9) holds for all $F \in C_c^\infty$.

For $F \in \mathcal{F}^m(P)$, by replacing $F$ by

$$F(x) - \sum_{j=0}^{m-1} \frac{F^{(j)}(0)}{j!} x^j$$

if necessary, we may assume, in light of (8), that $F^{(j)}(0) = 0$ for $j = 0, \ldots, m - 1$, and hence

$$F = I^m f, \quad \text{where } (If)(x) = \int_0^x f(u)\, du,$$

for some measurable function $f$. Since $F = F_1 - F_2$ where $F_1 = I^m f^+$ and $F_2 = I^m f^-$, it suffices by linearity to consider $f \ge 0$. Letting $0 \le f_n \uparrow f$ we have $I^m f_n \to I^m f$, and hence the equality in (9) holds for $F \in \mathcal{F}^m(P)$, using the monotone and dominated convergence theorems on the right and left sides of (9), respectively.

The distribution of $X^{(P)}$ is unique since (9) holds for all $F \in C_c^\infty$, which is a separating class. ■
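The construction (10) to (12) is explicit enough to simulate. The following Monte Carlo sketch is an illustration under our own assumptions: $X$ is finitely supported, $P$ is given as a vectorized function together with its sign change points, and the example takes the order two Hermite polynomial $P(x) = x^2 - 1$ with $X = \pm 2$ with probability $1/8$ each and $X = 0$ with probability $3/4$, for which (8) holds with $m = 2$ and $\alpha = 3/2$. Applying (9) to $F(x) = x^4/12$ then predicts $E(X^{(P)})^2 = 2/3$. The function name `sample_x_p` is hypothetical.

```python
import math

import numpy as np


def sample_x_p(values, probs, P, roots, alpha, size, rng):
    """Draw `size` samples of X^(P) via the construction (10)-(12) in the proof of Theorem 2.1.

    values, probs : a finitely supported X (illustrative inputs)
    P             : the biasing function, vectorized over numpy arrays
    roots         : the sign change points r_1 < ... < r_m of P
    alpha         : the constant in (8)
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    m = len(roots)
    Q = np.ones_like(values)
    for r in roots:                                   # Q(x) = prod_i (x - r_i), as in (10)
        Q = Q * (values - r)
    # (11) with P rescaled to P / alpha: mu_Y puts mass Q(x) P(x) p(x) / (m! alpha) at x
    w = Q * P(values) * probs / (math.factorial(m) * alpha)
    if np.any(w < -1e-12) or abs(w.sum() - 1.0) > 1e-9:
        raise ValueError("conditions (8) or the sign pattern of P are violated")
    w = np.clip(w, 0.0, None)
    Y = rng.choice(values, size=size, p=w / w.sum())
    # U_j has distribution function u^j on [0, 1]; invert it: U_j = V^(1/j) with V uniform
    U = [None] + [rng.random(size) ** (1.0 / j) for j in range(1, m + 1)]
    r = [Y] + [np.full(size, float(x)) for x in roots] + [np.zeros(size)]  # r_0 = Y, r_{m+1} = 0
    x = np.zeros(size)
    for k in range(1, m + 2):                         # the sum over k in (12)
        V = np.ones(size)
        for i in range(k, m + 1):                     # prod_{i=k}^m U_i (empty product = 1)
            V = V * U[i]
        x += V * (r[k - 1] - r[k])
    return x


rng = np.random.default_rng(0)
xs = sample_x_p([-2.0, 0.0, 2.0], [1 / 8, 3 / 4, 1 / 8], lambda x: x ** 2 - 1,
                [-1.0, 1.0], 1.5, 400_000, rng)
print(np.mean(xs ** 2))  # should be close to 2/3
```

Inverting the distribution function $u^j$ as $V^{1/j}$ with $V$ uniform is one convenient way to draw $U_j$; any other sampler for that law would do. With $m = 0$ the sketch simply returns draws of $Y$, which for $P(x) = x^+$ is the size biased distribution.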

The existence of the distribution also follows from the Riesz representation theorem upon demonstrating the positivity of the linear operator $T$ defined by

$$Tf = E\,P(X)F(X) \quad \text{with } F = I^m f$$

over $f \in C_c$, the space of continuous functions with compact support. The signed measure $d\nu = P\, d\mu_X$ has the property $\int x^j\, d\nu = EX^jP(X) = 0$ for $j = 0, 1, \ldots, m - 1$, and now the sign change property of $P$ allows us, when on the finite interval $[a, b]$, to invoke Theorem 5.4 in Chapter XI of [19] (see also Example 1.4 in Chapter XI) to conclude that $T$ is positive, and hence $Tf = \int f\, d\mu_{X^{(P)}}$ for some measure $\mu_{X^{(P)}}$, which is a probability measure since $EX^mP(X) = m!$. This argument is similar to the one used to prove the existence of the zero bias distribution for a mean zero, finite variance $X$, by noting that when $f \ge 0$ the function $F = If$ is non-decreasing, hence $X$ and $F(X)$ are positively correlated, and so the operator

$$Tf = E\,XF(X) \ge EX\, EF(X) = 0$$

is positive.

Example 2.1 Consider the application of Theorem 2.1 where $P(x)$ has exactly $m = 1$ sign change, at $r_1 = 0$. Then for a non-constant $X$ to be in the domain of the transformation characterized by

$$E\,P(X)F(X) = \alpha\, E F'(X^{(P)}) \tag{18}$$

we require $EP(X) = 0$ and $\alpha = EXP(X) > 0$. We have $Q(x) = x$ in (10) and, recalling that the proof rescales $P$ to have $\alpha = 1$, the $Y$ distribution in (11) is

$$d\mu_Y(y) = \frac{y P(y)}{\alpha}\, d\mu_X(y).$$

From (12) with $m = 1$, $r_0 = Y$, $r_2 = 0$ and $U_1$ with distribution function $u$ on $[0, 1]$,

$$X^{(P)} = \sum_{k=1}^{2} \Big( \prod_{i=k}^{1} U_i \Big)(r_{k-1} - r_k) = U_1(r_0 - r_1) + (r_1 - r_2) = U_1 Y.$$

Hence $X^{(P)}$ is absolutely continuous, and one can directly verify that its density is given by

$$f^{(P)}(x) = \alpha^{-1} E\big[P(X)\mathbf{1}(X > x)\big]. \tag{19}$$

When $\int_0^x P(u)\,du$ is finite for all $x$ and $c = \int_{-\infty}^{\infty} \exp\big(-\alpha^{-1}\int_0^x P(u)\,du\big)\,dx < \infty$, the transformation has a fixed point at the distribution with density

$$f(x) = c^{-1} \exp\Big(-\frac{1}{\alpha}\int_0^x P(u)\,du\Big);$$

for instance, when $P(x) = x$, $f$ is the mean zero normal density with variance $\alpha$.
Taking $P$ to be the sign function

$$P(x) = \mathbf{1}(x > 0) - \mathbf{1}(x < 0)$$

provides an example of a transformation given by a discontinuous $P$, and shows that in general the orthogonality conditions may not reduce to restrictions on the moments of $X$; in particular, (8) for $k = 0$ requires $X$ to have median 0. If in addition $\alpha = E|X|$ is finite, as imposed by (8) for $k = 1$, Theorem 2.1 gives that $X$ is in the domain of the transformation characterized by (18). The density of the transformed variable is, by (19),

$$f^{(P)}(x) = \begin{cases} P(X > x)/E|X|, & x \ge 0, \\ P(X < x)/E|X|, & x < 0. \end{cases} \tag{20}$$

For this choice of $P$ the $Y$ distribution in (11) becomes

$$d\mu_Y(y) = |y|\, d\mu_X(y)/E|X|,$$

which is the $|X|$-size biased distribution. Hence the $X$-$P$ biased distribution is obtained by multiplying $Y \sim \mu_Y$ by an independent $\mathcal{U}[0, 1]$ variable. The transformation has a fixed point at the Laplace distribution with density

$$f(x) = \frac{1}{2b} e^{-|x|/b}, \qquad \text{for any scale } b = E|X| > 0.$$

Taking $P(x) = \mathbf{1}(x > 1) - \mathbf{1}(x < -1)$ gives a transformation having domain those variables $X$ with $\alpha = E(|X|\mathbf{1}(|X| > 1)) < \infty$ and satisfying

$$P(X > 1) = P(X < -1). \tag{21}$$

Since $P(x) = 0$ on the set $[-1, 1]$, the sign change can be said to occur at any point in $(-1, 1)$, and the polynomial $Q$ in the proof of Theorem 2.1 can be taken to be

$$Q(x) = x - r_1 \quad \text{for any } r_1 \in (-1, 1).$$

As assured by uniqueness, the distribution constructed in the proof of Theorem 2.1 does not depend on the choice of $r_1$; in fact, in this case (21) implies that (11) defines a probability measure, with the same normalizing constant $\alpha$, for every $r_1 \in (-1, 1)$.
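For the sign function example, the recipe 'size bias $|X|$, keep the sign, multiply by an independent uniform' can also be simulated. The sketch below is an approximation of our own: it builds $Y$ by resampling draws of $X$ with weights $|x|$, an empirical stand-in for $\mu_Y$, and checks the Laplace fixed point with scale $b = 1.5$ (an arbitrary choice) through two of its features, $E|X^{(P)}| = b$ and $P(X^{(P)} > b) = e^{-1}/2$. The function name is hypothetical.

```python
import numpy as np


def sign_biased_sample(x, rng):
    """Approximate draws of X^(P) for P(x) = sign(x), following Example 2.1.

    `x` holds draws from X (assumed to have median 0 and E|X| finite). Y is
    obtained by resampling x with weights |x|, an empirical version of
    d mu_Y(y) = |y| d mu_X(y) / E|X|; then X^(P) = U Y with U ~ U[0, 1].
    """
    w = np.abs(x)
    y = rng.choice(x, size=x.size, p=w / w.sum())
    return rng.random(x.size) * y


rng = np.random.default_rng(1)
b = 1.5
x = rng.laplace(0.0, b, 200_000)
xp = sign_biased_sample(x, rng)
# For the Laplace fixed point, E|X^(P)| = b and P(X^(P) > b) = exp(-1) / 2
print(np.mean(np.abs(xp)), np.mean(xp > b))
```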
3 Transformations using orthogonal polynomials

We consider a system of polynomials orthogonal with respect to a non-trivial family of distributions $Z_\lambda \sim \mathcal{L}_\lambda$ indexed by a real parameter $\lambda$.

Condition 3.1 For some $m \ge 0$, the polynomials $\{P_\lambda^k(x)\}_{0 \le k \le m}$ are monic, have degree $k$, are orthogonal with respect to the distributional family $Z_\lambda \sim \mathcal{L}_\lambda$, and satisfy $E[P_\lambda^k(Z_\lambda)]^2 > 0$.

Note that since $P_\lambda^k$ is monic and orthogonal it has $k$ distinct real roots and is positive as $x \to \infty$ (e.g. [?]); furthermore, we have

$$E Z_\lambda^k P_\lambda^k(Z_\lambda) = E[P_\lambda^k(Z_\lambda)]^2, \qquad k = 0, \ldots, m.$$

When studying transformations using an implicit family of orthogonal polynomials, we index the transformed distribution by $X_\lambda^{(k)}$, say, that is, by the parameter $\lambda$ and the order $k$ of the polynomial.

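Applying Theorem 2.1 in this framework, we obtain the following.

Corollary 3.1 Let Condition 3.1 be satisfied with $EZ_\lambda^{2m} < \infty$, and for $0 \le k \le m$ set

$$\alpha_\lambda^{(k)} = \frac{1}{k!} E Z_\lambda^k P_\lambda^k(Z_\lambda). \tag{22}$$

Then for all $X \in \mathcal{M}_\lambda^k$, where

$$\mathcal{M}_\lambda^k = \{X : EX^j = EZ_\lambda^j,\ 0 \le j \le 2k\},$$

there exists a random variable $X_\lambda^{(k)}$ such that for all $F \in \mathcal{F}^k(P_\lambda^k)$

$$E\,P_\lambda^k(X)F(X) = \alpha_\lambda^{(k)} E F^{(k)}(X_\lambda^{(k)}). \tag{23}$$

Proof: By Condition 3.1 and orthogonality we have, for $0 \le j \le k \le m$,

$$\frac{1}{k!} EX^j P_\lambda^k(X) = \frac{1}{k!} EZ_\lambda^j P_\lambda^k(Z_\lambda) = \frac{1}{k!} E P_\lambda^j(Z_\lambda) P_\lambda^k(Z_\lambda) = \alpha_\lambda^{(k)} \delta_{j,k},$$

using $X \in \mathcal{M}_\lambda^k$. Now invoke Theorem 2.1. ■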
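For polynomial $P$, the conditions (8) only involve the moments $EX^j$, $j \le m + \deg P$, which makes membership in the domain easy to test. The sketch below, with hypothetical helper names, checks (8) in exact arithmetic and returns $\alpha$; it does not verify the sign change requirement, which holds automatically for orthogonal polynomials. For $H_\lambda^2(x) = x^2 - \lambda$ and normal moments it returns $\lambda^2$, in agreement with $\alpha_\lambda^{(2)}$ found in Section 4.1. For the three point variable $X = \pm 2$ (probability $1/8$ each), $X = 0$ (probability $3/4$) with $P(x) = x^2 - 1$ it returns $3/2$, even though that $X$ is not in $\mathcal{M}_1^2$, since $EX^4 = 4 \ne 3$.

```python
from fractions import Fraction
from math import factorial


def transform_constant(p_coeffs, moments, m):
    """Check the moment conditions (8) for a polynomial biasing function P and return alpha.

    p_coeffs[i] is the coefficient of x^i in P, and moments[j] = E X^j must be known
    for j <= m + deg P. Only (8) is checked; the sign change requirement of
    Theorem 2.1 is assumed to hold.
    """
    def e_xk_p(k):                                    # E X^k P(X)
        return sum(c * moments[k + i] for i, c in enumerate(p_coeffs))

    for k in range(m):
        if e_xk_p(k) != 0:
            raise ValueError(f"E X^{k} P(X) = {e_xk_p(k)}, but (8) requires 0")
    alpha = Fraction(e_xk_p(m)) / factorial(m)
    if alpha <= 0:
        raise ValueError("(8) requires alpha > 0")
    return alpha


def normal_moments(lam, n):
    """E Z^j for Z ~ N(0, lam), j = 0, ..., n."""
    mom = [Fraction(1)]
    for j in range(1, n + 1):
        mom.append(Fraction(0) if j % 2 else mom[j - 2] * (j - 1) * lam)
    return mom


lam = Fraction(3)
print(transform_constant([-lam, 0, 1], normal_moments(lam, 4), 2))  # H_lam^2: prints 9
```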
We say the family of distributions $Z_\lambda$ is closed under independent addition if for independent $Z_{\lambda_i} \sim \mathcal{L}_{\lambda_i}$, $i = 1, 2$, we have $Z_{\lambda_1} + Z_{\lambda_2} \sim \mathcal{L}_{\lambda_1 + \lambda_2}$. There is special structure when the transformation function in Theorem 2.1 is a member of an orthogonal polynomial system corresponding to such a family. In particular, the following Theorem 3.1 generalizes (3) and (7) in showing how a sum of independent variables can be $P_\lambda^m$ transformed by replacing a randomly chosen collection of summands by variables with distributions transformed using the same orthogonal polynomial system.

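For $n = 1, 2, \ldots$, consider a multi-index $\mathbf{m} = (m_1, \ldots, m_n)$, and with $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_n)$ and $\mathbf{x} = (x_1, \ldots, x_n)$ let

$$m = |\mathbf{m}| = \sum_{i=1}^n m_i, \qquad \lambda = \sum_{i=1}^n \lambda_i,$$

and set

$$P_{\boldsymbol{\lambda}}^{\mathbf{m}}(\mathbf{x}) = \prod_{i=1}^n P_{\lambda_i}^{m_i}(x_i) \quad \text{and} \quad \alpha_{\boldsymbol{\lambda}}^{(\mathbf{m})} = \prod_{i=1}^n \alpha_{\lambda_i}^{(m_i)}. \tag{24}$$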

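Theorem 3.1 Let $Z_\lambda$, $\lambda > 0$, be a family of random variables closed under independent addition with $EZ_\lambda^{2m} < \infty$, and suppose the associated orthogonal polynomials $\{P_\lambda^k(x)\}_{0 \le k \le m}$ satisfy Condition 3.1 and, for some weights $c_{\mathbf{m}}$, the identity

$$P_\lambda^m(w) = \sum_{\mathbf{m} : |\mathbf{m}| = m} c_{\mathbf{m}} P_{\boldsymbol{\lambda}}^{\mathbf{m}}(\mathbf{x}), \tag{25}$$

where $P_{\boldsymbol{\lambda}}^{\mathbf{m}}(\mathbf{x})$ is given in (24) and $w = x_1 + \cdots + x_n$. Then $\alpha_\lambda^{(m)}$ and $\alpha_{\boldsymbol{\lambda}}^{(\mathbf{m})}$, defined in (22) and (24) respectively, satisfy

$$\alpha_\lambda^{(m)} = \sum_{\mathbf{m} : |\mathbf{m}| = m} c_{\mathbf{m}}\, \alpha_{\boldsymbol{\lambda}}^{(\mathbf{m})}, \tag{26}$$

and we may consider the variable $I$, independent of all other variables, with distribution

$$P(I = \mathbf{m}) = c_{\mathbf{m}} \frac{\alpha_{\boldsymbol{\lambda}}^{(\mathbf{m})}}{\alpha_\lambda^{(m)}}, \qquad |\mathbf{m}| = m. \tag{27}$$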
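Furthermore, for any positive $\lambda_1, \ldots, \lambda_n$ and independent variables $X_1, \ldots, X_n$ with

$$X_i \in \mathcal{M}_{\lambda_i}^m \quad \text{and} \quad W = \sum_{i=1}^n X_i,$$

the variable

$$W_\lambda^{(m)} = \sum_{\mathbf{m} : |\mathbf{m}| = m} W^{(\mathbf{m})}\, \mathbf{1}(I = \mathbf{m}), \qquad W^{(\mathbf{m})} = \sum_{i=1}^n X_{i,\lambda_i}^{(m_i)},$$

has the $W$-$P_\lambda^m$ biased distribution; here $X_{i,\lambda_i}^{(m_i)}$ denotes a variable with the $X_i$-$P_{\lambda_i}^{m_i}$ biased distribution of Corollary 3.1 (for $m_i = 0$ one may take $X_{i,\lambda_i}^{(0)} = X_i$), and these variables are independent of each other and of $I$.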

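Proof. Since $X_i \in \mathcal{M}_{\lambda_i}^m$, we have for $0 \le k \le 2m$, with independent $Z_{\lambda_i} \sim \mathcal{L}_{\lambda_i}$ and $Z_\lambda \sim \mathcal{L}_\lambda$, that

$$EW^k = E\Big(\sum_{i=1}^n X_i\Big)^k = E\Big(\sum_{i=1}^n Z_{\lambda_i}\Big)^k = EZ_\lambda^k.$$

Hence $W \in \mathcal{M}_\lambda^m$, and the distribution of $W_\lambda^{(m)}$ exists by Corollary 3.1. Equality (26) follows by multiplying (25), with $\mathbf{x} = (X_1, \ldots, X_n)$, by $W^m = (\sum_i X_i)^m$, taking expectation, and using independence and orthogonality.

By (27), for any $F \in C_c^\infty$,

$$\alpha_\lambda^{(m)} E F^{(m)}(W_\lambda^{(m)}) = E \sum_{\mathbf{m}} c_{\mathbf{m}}\, \alpha_{\boldsymbol{\lambda}}^{(\mathbf{m})} F^{(m)}(W^{(\mathbf{m})}). \tag{28}$$

Using independence and successively applying the identity

$$\alpha_\lambda^{(k)} E F^{(k)}(X_\lambda^{(k)} + y) = E\,P_\lambda^k(X)F(X + y), \tag{29}$$

which is (23) applied to the shifted function $F(\cdot + y)$, we see that the right hand side of (28) is equal to

$$E \sum_{\mathbf{m}} c_{\mathbf{m}} P_{\boldsymbol{\lambda}}^{\mathbf{m}}(X_1, \ldots, X_n) F(W) = E\,P_\lambda^m(W)F(W), \tag{30}$$

by (25). Comparing (28) to (30) we have

$$\alpha_\lambda^{(m)} E F^{(m)}(W_\lambda^{(m)}) = E\,P_\lambda^m(W)F(W)$$

for all $F \in C_c^\infty$, and hence $W_\lambda^{(m)}$ has the $W$-$P_\lambda^m$ biased distribution. ■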

For the possibly infinite system of monic polynomials $\{P_\lambda^m(x)\}$ orthogonal with respect to $\mathcal{L}_\lambda$, define the generating function

$$\phi_t(x, \lambda) = \sum_{m \ge 0} P_\lambda^m(x) \frac{t^m}{m!}. \tag{31}$$

Though the constants $\alpha_\lambda^{(m)}$ can be found using $F(x) = x^m$ in (23), squaring (31) and taking expectation using orthogonality gives the alternative method

$$E[\phi_t(Z_\lambda, \lambda)]^2 = \sum_{m \ge 0} \alpha_\lambda^{(m)} \frac{t^{2m}}{m!}. \tag{32}$$

Theorem 3.2 applies in the special cases considered in Sections 4.1 through 4.4.

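Theorem 3.2 If the polynomial generating function $\phi_t(x, \lambda)$ in (31) satisfies

$$\phi_t(w, \lambda) = \prod_{i=1}^n \phi_t(x_i, \lambda_i) \tag{33}$$

for $w = x_1 + \cdots + x_n$ and $\lambda = \lambda_1 + \cdots + \lambda_n$, then (25) and hence (27) in Theorem 3.1 are satisfied, respectively, by

$$c_{\mathbf{m}} = \binom{m}{\mathbf{m}} \quad \text{and} \quad P(I = \mathbf{m}) = \binom{m}{\mathbf{m}} \frac{\alpha_{\boldsymbol{\lambda}}^{(\mathbf{m})}}{\alpha_\lambda^{(m)}}, \qquad |\mathbf{m}| = m,$$

where $\binom{m}{\mathbf{m}} = m!/(m_1! \cdots m_n!)$ is the multinomial coefficient.

Proof: Rewriting (33),

$$\sum_{m \ge 0} P_\lambda^m(w) \frac{t^m}{m!} = \prod_{i=1}^n \sum_{m_i \ge 0} P_{\lambda_i}^{m_i}(x_i) \frac{t^{m_i}}{m_i!} = \sum_{m \ge 0} \frac{t^m}{m!} \sum_{\mathbf{m} : |\mathbf{m}| = m} \binom{m}{\mathbf{m}} P_{\boldsymbol{\lambda}}^{\mathbf{m}}(\mathbf{x}),$$

giving (25) with the values claimed. ■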
We also note that squaring (25) and taking expectation, using independence and orthogonality, results in

$$\alpha_\lambda^{(m)} = \sum_{\mathbf{m} : |\mathbf{m}| = m} c_{\mathbf{m}}^2 \frac{m_1! \cdots m_n!}{m!}\, \alpha_{\boldsymbol{\lambda}}^{(\mathbf{m})}, \tag{34}$$

so that the conclusion of Theorem 3.2 can also be seen to hold by equating coefficients of (26) and (34) when $\boldsymbol{\lambda}$ takes on sufficiently many values.
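The index distributions produced by Theorem 3.2 can be tabulated directly from the constants $\alpha_\lambda^{(k)}$. The sketch below is ours and plugs in the constants derived in Section 4: $\lambda^k$ (Hermite and Charlier), the rising factorial $(\lambda)_k$ (Laguerre), and the falling factorial times $(pq)^k$ (Krawtchouk, here with the arbitrary choice $p = 1/3$). In each case the weights sum to one, and in the Hermite and Krawtchouk cases they reduce to the multinomial and multivariate hypergeometric probabilities stated there. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def multi_indices(m, n):
    """All multi-indices (m_1, ..., m_n) of non-negative integers summing to m."""
    return [mm for mm in product(range(m + 1), repeat=n) if sum(mm) == m]


def index_law(alpha, lams, m):
    """P(I = mm) = multinomial(m; mm) * prod_i alpha(lam_i, m_i) / alpha(lam, m), i.e. (27)
    with the weights of Theorem 3.2; alpha(lam, k) must return alpha_lam^(k)."""
    lam = sum(lams)
    law = {}
    for mm in multi_indices(m, len(lams)):
        c = Fraction(factorial(m), prod(factorial(k) for k in mm))
        law[mm] = c * prod(alpha(l, k) for l, k in zip(lams, mm)) / alpha(lam, m)
    return law


def rising(x, k):
    return prod((x + i for i in range(k)), start=Fraction(1))


def falling(x, k):
    return prod((x - i for i in range(k)), start=Fraction(1))


hermite_alpha = lambda l, k: l ** k                                   # also Charlier
laguerre_alpha = rising
krawtchouk_alpha = lambda l, k: falling(l, k) * Fraction(2, 9) ** k  # p = 1/3, q = 2/3

lams = [Fraction(1), Fraction(2), Fraction(3)]
for name, a in [("Hermite", hermite_alpha), ("Laguerre", laguerre_alpha),
                ("Krawtchouk", krawtchouk_alpha)]:
    print(name, index_law(a, lams, 3)[(1, 1, 1)])
```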

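We end this section with a result about the potential for iterated biasing.

Theorem 3.3 Let Condition 3.1 be satisfied, and suppose that the distributional family $Z_\lambda$ is closed under transformation with respect to $P_\lambda^k(x)$, that is, there exists $\mu(\lambda, k)$ such that

$$(Z_\lambda)_\lambda^{(k)} =_d Z_{\mu(\lambda, k)}.$$

Then if $X \in \mathcal{M}_\lambda^m$ we have $X_\lambda^{(k)} \in \mathcal{M}_{\mu(\lambda,k)}^{m-k}$ for $k \le m$. In particular, for non-negative $j$ with $0 \le k + j \le m$, the distribution $(X_\lambda^{(k)})_{\mu(\lambda,k)}^{(j)}$ exists.

Proof. Let $0 \le j \le 2(m - k)$ and $F(x) = x^{k+j}/(k+j)_k$, where $(x)_k = x(x-1)\cdots(x-k+1)$. Then

$$\alpha_\lambda^{(k)} E\big(X_\lambda^{(k)}\big)^j = \alpha_\lambda^{(k)} E F^{(k)}(X_\lambda^{(k)}) = E\,P_\lambda^k(X)F(X) = E\,P_\lambda^k(Z_\lambda)F(Z_\lambda) = \alpha_\lambda^{(k)} E F^{(k)}\big((Z_\lambda)_\lambda^{(k)}\big) = \alpha_\lambda^{(k)} E Z_{\mu(\lambda,k)}^j.$$

Thus the first $2(m - k)$ moments of $X_\lambda^{(k)}$ match those of $Z_{\mu(\lambda,k)}$, and the existence of the distribution $(X_\lambda^{(k)})_{\mu(\lambda,k)}^{(j)}$ follows from Corollary 3.1. ■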

4 Special Orthogonal Polynomial Systems

In Sections 4.1 through 4.5 we specialize to the classic Hermite, Laguerre, Charlier, Krawtchouk and Gegenbauer orthogonal polynomial systems, corresponding to the Normal, Gamma, Poisson, Binomial and a Beta-like family, respectively. All these families correspond to a collection of orthogonal polynomials satisfying Condition 3.1 and, except for the last case, have a generating function which satisfies (33). The Normal and Poisson distributions are fixed points of their associated transformations. In the Gamma, Binomial and Beta-type cases the transformations map to the same family, but with a shifted parameter. For further connections between probability distributions and such polynomial system generating functions, see [2] and [3].
4.1 Hermite Polynomials

For $\sigma^2 = \lambda > 0$, define the collection of Hermite polynomials $\{H_\lambda^m(x)\}_{m \ge 0}$ through the generating function

$$\exp\Big(xt - \frac{\lambda t^2}{2}\Big) = \sum_{m=0}^{\infty} H_\lambda^m(x) \frac{t^m}{m!}, \tag{35}$$

or equivalently, the Rodrigues formula

$$H_\lambda^m(x) = (-\lambda)^m e^{x^2/(2\lambda)} \frac{d^m}{dx^m} e^{-x^2/(2\lambda)}. \tag{36}$$

These polynomials are orthogonal with respect to the normal distribution $\mathcal{N}(0, \lambda)$ with density $(2\pi\lambda)^{-1/2} \exp(-x^2/(2\lambda))$.

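For $F \in C_c^\infty$ and $Z_\lambda \sim \mathcal{N}(0, \lambda)$, applying the Rodrigues formula (36) and integrating by parts $m$ times, we have

$$E\,H_\lambda^m(Z_\lambda)F(Z_\lambda) = (-\lambda)^m \int_{-\infty}^{\infty} F(x) \frac{d^m}{dx^m} \frac{e^{-x^2/(2\lambda)}}{\sqrt{2\pi\lambda}}\, dx = \lambda^m \int_{-\infty}^{\infty} F^{(m)}(x) \frac{e^{-x^2/(2\lambda)}}{\sqrt{2\pi\lambda}}\, dx = \lambda^m E F^{(m)}(Z_\lambda). \tag{37}$$

Hence

$$(Z_\lambda)_\lambda^{(m)} =_d Z_\lambda,$$

that is, for each $m = 0, 1, \ldots$, the normal $Z_\lambda \sim \mathcal{N}(0, \lambda)$ is a fixed point of the $m$th order transformation induced by $H_\lambda^m(x)$.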

From (37) we see that $\alpha_\lambda^{(m)} = \lambda^m$, which we could find alternatively using (32) and

$$E\big[e^{Z_\lambda t - \lambda t^2/2}\big]^2 = e^{\lambda t^2} = \sum_{m \ge 0} \lambda^m \frac{t^{2m}}{m!}.$$
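Identity (37) and the value $\alpha_\lambda^{(m)} = \lambda^m$ can be confirmed numerically with Gauss-Hermite quadrature, which is exact for polynomial integrands of bounded degree. The sketch assumes NumPy's `numpy.polynomial.hermite_e` module, whose polynomials $He_m$ are the case $\lambda = 1$ here, and rescales them via $H_\lambda^m(x) = \lambda^{m/2} He_m(x/\sqrt{\lambda})$, which follows from (35). The test polynomial $F$ and the values $\lambda = 2.5$, $m = 3$ are arbitrary choices, and the helper names are hypothetical.

```python
from math import factorial, pi, sqrt

import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial import hermite_e as He


def hermite(lam, m):
    """Monic H_lam^m as a Polynomial, via H_lam^m(x) = lam^(m/2) He_m(x / sqrt(lam))."""
    coeffs = He.herme2poly([0] * m + [1])             # He_m in the power basis
    powers = np.arange(m + 1)
    return Polynomial(coeffs * lam ** ((m - powers) / 2))


def normal_expect(f, lam, n=40):
    """E f(Z_lam), Z_lam ~ N(0, lam), by Gauss-Hermite quadrature
    (exact for polynomials of degree <= 2n - 1)."""
    nodes, weights = He.hermegauss(n)                 # weight function exp(-x^2 / 2)
    return np.sum(weights * f(np.sqrt(lam) * nodes)) / sqrt(2 * pi)


lam, m = 2.5, 3
H = hermite(lam, m)
F = Polynomial([1.0, -2.0, 0.5, 0.0, 1.0, 0.3])       # an arbitrary test polynomial
lhs = normal_expect(lambda z: H(z) * F(z), lam)       # E H_lam^m(Z) F(Z)
rhs = lam ** m * normal_expect(F.deriv(m), lam)       # lam^m E F^(m)(Z), as in (37)
alpha = normal_expect(lambda z: H(z) ** 2, lam) / factorial(m)  # should equal lam^m
print(lhs, rhs, alpha, lam ** m)
```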



Now since the generating function (35) satisfies the conditions of Theorem 3.2, the distribution of the random index $I$ in Theorem 3.1 is multinomial $\mathrm{Mult}(m, \boldsymbol{\lambda}/\lambda)$, that is, $m$ trials with cell probabilities $\lambda_i/\lambda$. For zero biasing and $m = 1$, this multinomial distribution reduces to the 'pick an index proportional to variance' rule specified in (6).

Lastly, we indicate two ways in which the classical Stein equation can be generalized to the Hermite case. With $\mathcal{N}h = Eh(Z)$, the standard normal expectation of $h$, both the equations

$$f'(x)H_1^{m-1}(x) - H_1^m(x)f(x) = h(x) - \mathcal{N}h \tag{38}$$

and

$$f^{(m)}(x) - H_1^m(x)f(x) = h(x) - \mathcal{N}h \tag{39}$$

reduce to the usual Stein equation (see [21], [22])

$$f'(x) - xf(x) = h(x) - \mathcal{N}h \tag{40}$$

when $m = 1$, and in particular the expectations of the left hand sides of each, evaluated at a random variable $W$, are zero for all smooth $f$ if and only if $W$ is standard normal.



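4.2 Laguerre Polynomials

For $\lambda > 0$, let $\{L_\lambda^m(x)\}_{m \ge 0}$ be the collection of Laguerre polynomials defined by the generating function

$$(1 + t)^{-\lambda} \exp\Big(\frac{xt}{1 + t}\Big) = \sum_{m \ge 0} L_\lambda^m(x) \frac{t^m}{m!}, \tag{41}$$

or equivalently, the Rodrigues formula

$$L_\lambda^m(x) = (-1)^m x^{-\lambda + 1} e^{x} \frac{d^m}{dx^m}\big(x^{\lambda + m - 1} e^{-x}\big), \tag{42}$$

which are orthogonal with respect to the Gamma distribution with parameter $\lambda$, having density $x^{\lambda - 1} e^{-x}/\Gamma(\lambda)$, $x > 0$.

For $F \in C_c^\infty$ and $Z_\lambda$ with this density, applying the Rodrigues formula (42) yields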
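$$E\,L_\lambda^m(Z_\lambda)F(Z_\lambda) = \frac{(-1)^m}{\Gamma(\lambda)} \int_0^\infty F(x) \frac{d^m}{dx^m}\big(x^{\lambda+m-1}e^{-x}\big)\,dx = \frac{\Gamma(\lambda+m)}{\Gamma(\lambda)} \int_0^\infty F^{(m)}(x) \frac{x^{\lambda+m-1}e^{-x}}{\Gamma(\lambda+m)}\,dx = (\lambda)_m E F^{(m)}(Z_{\lambda+m}), \tag{43}$$

where $(\lambda)_m$ is the rising factorial,

$$(\lambda)_m = \lambda(\lambda+1)\cdots(\lambda+m-1) = \frac{\Gamma(\lambda+m)}{\Gamma(\lambda)}.$$

Hence

$$(Z_\lambda)_\lambda^{(m)} =_d Z_{\lambda+m}.$$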

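From (43) we see that $\alpha_\lambda^{(m)} = (\lambda)_m$, which we could find alternatively using (32) and

$$E\Big[(1+t)^{-\lambda} \exp\Big(\frac{Z_\lambda t}{1+t}\Big)\Big]^2 = (1 - t^2)^{-\lambda} = \sum_{m \ge 0} (\lambda)_m \frac{t^{2m}}{m!}.$$

Since the generating function (41) satisfies the conditions of Theorem 3.2, the random index $I$ in Theorem 3.1 has distribution

$$P(I = \mathbf{m}) = \frac{m!}{m_1! \cdots m_n!} \frac{\prod_{i=1}^n (\lambda_i)_{m_i}}{(\lambda)_m} = \frac{\prod_{i=1}^n \binom{\lambda_i + m_i - 1}{m_i}}{\binom{\lambda + m - 1}{m}}, \qquad |\mathbf{m}| = m,$$

which we recognize as the multivariate hypergeometric distribution with parameters $m$ and $\lambda_1 + m_1 - 1, \ldots, \lambda_n + m_n - 1$, see [?], p. 301.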

Though the Gamma is not a fixed point of the Laguerre transformations as the normal is for the Hermite transformations, there nevertheless exist Stein equations for the Gamma paralleling (40) for the normal, which can be used for studying distributional approximations for the Gamma family; for details, see [20]. In particular, we have the Stein characterization that $X \sim \Gamma(\lambda, 1)$ if and only if

$$E(X - \lambda)f(X) = E\,Xf'(X)$$

for all smooth functions $f$. Using that $L_\lambda^1(x) = x - \lambda$, the order one Laguerre transformation $X_\lambda^{(1)}$ is characterized by

$$E(X - \lambda)f(X) = \lambda E f'(X_\lambda^{(1)})$$

for all smooth functions $f$. Comparing these two equations, we see that $X \sim \Gamma(\lambda, 1)$ if and only if for all smooth functions $f$,

$$E\,Xf'(X) = \lambda E f'(X_\lambda^{(1)});$$

in other words, $X \sim \Gamma(\lambda, 1)$ if and only if $X_\lambda^{(1)}$, the first order Laguerre transformation of $X$, has the same distribution as its size bias transformation $X^s$.
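Since the moments of $Z_\lambda \sim \Gamma(\lambda, 1)$ are $EZ_\lambda^j = (\lambda)_j$, identity (43) can be verified in exact arithmetic for monomials $F(x) = x^k$. The sketch below reads the coefficients of $L_\lambda^m$ off the generating function (41), using our own expansion $L_\lambda^m(x) = \sum_{j} \binom{m}{j}(-1)^{m-j}(\lambda+j)_{m-j}x^j$, and compares both sides of (43); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, prod


def rising(x, k):
    """Rising factorial (x)_k = x (x + 1) ... (x + k - 1)."""
    return prod((x + i for i in range(k)), start=Fraction(1))


def laguerre_coeffs(lam, m):
    """Power-basis coefficients of L_lam^m, read off from the generating function (41):
    L_lam^m(x) = sum_j C(m, j) (-1)^(m - j) (lam + j)_(m - j) x^j."""
    return [comb(m, j) * (-1) ** (m - j) * rising(lam + j, m - j) for j in range(m + 1)]


def laguerre_sides(lam, m, k):
    """Both sides of (43) for F(x) = x^k, using E Z_lam^j = (lam)_j for Z_lam ~ Gamma(lam, 1)."""
    lhs = sum(c * rising(lam, j + k) for j, c in enumerate(laguerre_coeffs(lam, m)))
    if k < m:
        return lhs, Fraction(0)                       # F^(m) vanishes
    falling_k = prod(range(k - m + 1, k + 1))         # k (k - 1) ... (k - m + 1)
    return lhs, rising(lam, m) * falling_k * rising(lam + m, k - m)


print(laguerre_coeffs(Fraction(2), 2))                # coefficients of x^2 - 6x + 6
print(laguerre_sides(Fraction(1, 2), 2, 5))
```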



4.3 Charlier Polynomials

For $\lambda > 0$, let $\{C_\lambda^m(x)\}_{m \ge 0}$ be the collection of Charlier polynomials defined by the generating function

$$e^{-\lambda t}(1 + t)^x = \sum_{m=0}^{\infty} C_\lambda^m(x) \frac{t^m}{m!}, \tag{44}$$

or, equivalently, with $(x)_k = x(x-1)\cdots(x-k+1)$, the falling factorial,

$$C_\lambda^m(x) = \sum_{k=0}^m \binom{m}{k} (x)_k (-\lambda)^{m-k}, \tag{45}$$

giving a family orthogonal with respect to the Poisson distribution $\mathcal{P}(\lambda)$ with mass function $e^{-\lambda}\lambda^k/k!$, $k = 0, 1, \ldots$. From (45) one can derive the Rodrigues formula

$$C_\lambda^m(x) = (-1)^m \Gamma(x+1) \lambda^{m-x} \nabla^m \Big(\frac{\lambda^x}{\Gamma(x+1)}\Big), \tag{46}$$

where $\nabla f(x) = f(x) - f(x-1)$ is the backward difference.

Since the transformations in Theorem 2.1, defined using derivatives of test functions, yield absolutely continuous distributions when $m \ge 1$, no discrete distribution will be a fixed point. However, parallel to (1), for an integer valued random variable $X$ we can define the discrete $X$-$P$ biased distribution via

$$E\,P(X)F(X) = \alpha\, E \Delta^m F(X^{(P)}) \quad \text{for all } F \in \mathcal{F}^\Delta(P), \tag{47}$$

where $\Delta f(x) = f(x+1) - f(x)$ and, again suppressing dependence on $X$,

$$\mathcal{F}^\Delta(P) = \{F : \mathbb{R} \to \mathbb{R} : E|P(X)F(X)| < \infty\}.$$
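That for all $m = 0, 1, \ldots$ the Poisson $\mathcal{P}(\lambda)$ distribution is a fixed point,

$$(Z_\lambda)_\lambda^{(m)} =_d Z_\lambda,$$

of the discrete transformation (47) with $P$ replaced by $C_\lambda^m$ can be seen as follows. For $Z_\lambda \sim \mathcal{P}(\lambda)$, by the Rodrigues formula (46) and the summation by parts identity

$$\sum_{k=0}^{\infty} \nabla^m b_k \cdot a_k = (-1)^m \sum_{k=0}^{\infty} b_k \Delta^m a_k, \tag{48}$$

valid when $b_k = 0$ for $k < 0$ and the sums converge, we have

$$E\,C_\lambda^m(Z_\lambda)F(Z_\lambda) = (-1)^m e^{-\lambda}\lambda^m \sum_{k=0}^{\infty} \nabla^m\Big(\frac{\lambda^k}{k!}\Big) F(k) = \lambda^m \sum_{k=0}^{\infty} \frac{e^{-\lambda}\lambda^k}{k!} \Delta^m F(k) = \lambda^m E \Delta^m F(Z_\lambda). \tag{49}$$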

From (49) we see that $\alpha_\lambda^{(m)} = \lambda^m$, which we could find alternatively using (32) and

$$E\big[e^{-\lambda t}(1 + t)^{Z_\lambda}\big]^2 = e^{\lambda t^2} = \sum_{m \ge 0} \lambda^m \frac{t^{2m}}{m!}.$$

Using the existence of the Charlier biased distributions and that (29) holds with derivatives replaced by differences, it is easy to see that the argument, and hence the conclusion, of Theorem 3.1 holds in this discrete case. Now since the generating function (44) satisfies the conditions of Theorem 3.2, the distribution of the random index $I$ in Theorem 3.1 is multinomial $\mathrm{Mult}(m, \boldsymbol{\lambda}/\lambda)$, as in the normal case.

As the order one Charlier polynomial is $C_\lambda^1(x) = x - \lambda$, Stein characterizations of the form (38) or (39), with Hermite replaced by Charlier and derivatives replaced by differences, generalize the Stein equation for the Poisson distribution with parameter $\lambda$, extensively studied for example in [1].
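The discrete identity (49) can likewise be checked numerically by summing the Poisson series. The sketch below is an illustration with hypothetical names: it evaluates $C_\lambda^m$ through (45), applies $\Delta^m$ via the binomial expansion $\Delta^m F(x) = \sum_j \binom{m}{j}(-1)^{m-j}F(x+j)$, and truncates the series after a fixed number of terms, which is harmless for moderate $\lambda$. The test function and $\lambda = 3$, $m = 2$ are arbitrary.

```python
from math import comb, exp, factorial, isclose


def charlier(lam, m, x):
    """C_lam^m(x) from (45): sum_k C(m, k) (x)_k (-lam)^(m - k), with (x)_k falling."""
    total = 0.0
    for k in range(m + 1):
        falling = 1.0
        for i in range(k):
            falling *= x - i
        total += comb(m, k) * falling * (-lam) ** (m - k)
    return total


def forward_diff(F, m, x):
    """Delta^m F(x) = sum_j C(m, j) (-1)^(m - j) F(x + j)."""
    return sum(comb(m, j) * (-1) ** (m - j) * F(x + j) for j in range(m + 1))


def poisson_expect(g, lam, kmax=150):
    """E g(Z) for Z ~ Poisson(lam), truncating the series after kmax terms."""
    total, pk = 0.0, exp(-lam)
    for k in range(kmax):
        total += pk * g(k)
        pk *= lam / (k + 1)
    return total


lam, m = 3.0, 2
F = lambda k: 1.0 / (1.0 + k) + 0.1 * k ** 3          # an arbitrary test function
lhs = poisson_expect(lambda k: charlier(lam, m, k) * F(k), lam)
rhs = lam ** m * poisson_expect(lambda k: forward_diff(F, m, k), lam)  # as in (49)
print(lhs, rhs, isclose(lhs, rhs, rel_tol=1e-9))
```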
4.4 Krawtchouk Polynomials

With $\lambda = 1, 2, \ldots$ and $p \in (0, 1)$ fixed, let $\{K_\lambda^m(x)\}_{0 \le m \le \lambda}$ be the collection of Krawtchouk polynomials defined by the generating function

$$(1 + qt)^x (1 - pt)^{\lambda - x} = \sum_{m=0}^{\lambda} K_\lambda^m(x) \frac{t^m}{m!}, \tag{50}$$

where $p + q = 1$, giving the family of polynomials orthogonal with respect to the Binomial $\mathcal{B}(\lambda, p)$ distribution. In contrast to the previous examples, the Binomial is not infinitely divisible and has support on a bounded set.

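Following the approach set out in [?], the polynomials can also be given by the Rodrigues formula

$$K_\lambda^m(x) = (-1)^m (\lambda)_m\, p^m \binom{\lambda}{x}^{-1} \Big(\frac{q}{p}\Big)^x \nabla^m\Big(\binom{\lambda - m}{x}\Big(\frac{p}{q}\Big)^x\Big), \tag{51}$$

where here $(\lambda)_m = \lambda(\lambda-1)\cdots(\lambda-m+1)$ denotes the falling factorial, and so for $Z_\lambda \sim \mathcal{B}(\lambda, p)$, $0 \le m \le \lambda$ and bounded $F$,

$$E\,K_\lambda^m(Z_\lambda)F(Z_\lambda) = \sum_{k=0}^{\lambda} \binom{\lambda}{k} p^k q^{\lambda-k} K_\lambda^m(k)F(k) = (-1)^m (\lambda)_m\, p^m q^\lambda \sum_{k=0}^{\lambda} \nabla^m\Big(\binom{\lambda-m}{k}\Big(\frac{p}{q}\Big)^k\Big) F(k).$$

Using (48), we write the last expression as

$$(\lambda)_m\, p^m q^\lambda \sum_{k=0}^{\lambda-m} \binom{\lambda-m}{k}\Big(\frac{p}{q}\Big)^k \Delta^m F(k) = (\lambda)_m (pq)^m E \Delta^m F(Z_{\lambda-m}),$$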
yielding

$$\alpha_\lambda^{(m)} = (\lambda)_m (pq)^m \quad \text{and} \quad (Z_\lambda)_\lambda^{(m)} =_d Z_{\lambda-m}.$$

Hence, similar to the Gamma family, the Binomial distribution is not a fixed point of its own transformational family, but the transformed distribution is a member of the same family. One can calculate $\alpha_\lambda^{(m)}$ alternatively using (32), (50), and the series expansion of

$$E(1 + qt)^{2Z_\lambda}(1 - pt)^{2\lambda - 2Z_\lambda} = (1 + pqt^2)^\lambda.$$

As in Section 4.3, the conclusion of Theorem 3.1 holds, and since the generating function (50) satisfies the conditions of Theorem 3.2, the distribution of the random index $I$ in Theorem 3.1 is given by

$$P(I = \mathbf{m}) = \frac{m!}{m_1! \cdots m_n!} \frac{\prod_{i=1}^n (\lambda_i)_{m_i} (pq)^{m_i}}{(\lambda)_m (pq)^m} = \frac{\prod_{i=1}^n \binom{\lambda_i}{m_i}}{\binom{\lambda}{m}}, \qquad |\mathbf{m}| = m,$$

which we recognize as the multivariate hypergeometric distribution with parameters $m$ and $\lambda_1, \ldots, \lambda_n$, see [?], p. 301.

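From [?] we have the Stein characterization that $X \sim \mathcal{B}(\lambda, p)$ if and only if

$$p\,E(\lambda - X)f(X + 1) = q\,E\,Xf(X)$$

for all functions $f$ for which these expectations exist. Using that the first Krawtchouk polynomial is $K_\lambda^1(x) = qx - p(\lambda - x)$, we obtain that the first order Krawtchouk transformation is characterized by

$$q\,E\,Xf(X) - p\,E(\lambda - X)f(X) = \lambda pq\, E \Delta f(X_\lambda^{(1)}).$$

Combining these equations yields that $X \sim \mathcal{B}(\lambda, p)$ if and only if

$$p\,E(\lambda - X)\Delta f(X) = p\,E(\lambda - X)\big(f(X+1) - f(X)\big) = q\,E\,Xf(X) - p\,E(\lambda - X)f(X) = \lambda pq\, E \Delta f(X_\lambda^{(1)}).$$

Putting $g(x) = \Delta f(\lambda - x)$, we see that $X \sim \mathcal{B}(\lambda, p)$ if and only if $\lambda - X_\lambda^{(1)}$ has the $(\lambda - X)$-size biased distribution, that is, if and only if

$$\lambda - X_\lambda^{(1)} \sim \mathcal{B}(\lambda - 1, q) + 1, \quad \text{which is equivalent to} \quad X_\lambda^{(1)} \sim \mathcal{B}(\lambda - 1, p).$$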
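For the Binomial family all expectations are finite sums, so the relation $E K_\lambda^m(Z_\lambda)F(Z_\lambda) = (\lambda)_m(pq)^m E\Delta^mF(Z_{\lambda-m})$ derived above can be checked exactly for arbitrary $F$. The sketch below computes $K_\lambda^m(x)$ at integers $0 \le x \le \lambda$ by expanding the generating function (50); the parameter values $\lambda = 5$, $p = 1/3$ and the function names are hypothetical choices.

```python
from fractions import Fraction
from math import comb, factorial


def krawtchouk(lam, p, m, x):
    """K_lam^m(x) for integer 0 <= x <= lam, read off from the generating function (50):
    m! * sum_j C(x, j) q^j C(lam - x, m - j) (-p)^(m - j)."""
    q = 1 - p
    return factorial(m) * sum(comb(x, j) * q ** j * comb(lam - x, m - j) * (-p) ** (m - j)
                              for j in range(m + 1))


def binomial_expect(g, lam, p):
    """E g(Z) for Z ~ B(lam, p), computed exactly."""
    q = 1 - p
    return sum(comb(lam, k) * p ** k * q ** (lam - k) * g(k) for k in range(lam + 1))


def forward_diff(F, m, x):
    """Delta^m F(x) = sum_j C(m, j) (-1)^(m - j) F(x + j)."""
    return sum(comb(m, j) * (-1) ** (m - j) * F(x + j) for j in range(m + 1))


def krawtchouk_sides(lam, p, m, F):
    """Both sides of E K_lam^m(Z_lam) F(Z_lam) = (lam)_m (pq)^m E Delta^m F(Z_{lam - m})."""
    q = 1 - p
    lhs = binomial_expect(lambda k: krawtchouk(lam, p, m, k) * F(k), lam, p)
    falling = factorial(lam) // factorial(lam - m)    # (lam)_m, falling factorial
    rhs = falling * (p * q) ** m * binomial_expect(lambda k: forward_diff(F, m, k), lam - m, p)
    return lhs, rhs


p = Fraction(1, 3)
print(krawtchouk_sides(5, p, 2, lambda k: Fraction(1, k + 1)))
```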
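4.5 Gegenbauer Polynomials

In this last section, we consider a polynomial system orthogonal with respect to a continuous distribution with compact support. For $\lambda > -\frac{1}{2}$, let (see [3])

$$\alpha_\lambda^{(m)} = \frac{\Gamma(\lambda)\Gamma(2\lambda + m)\Gamma(\lambda + 1)}{2^{2m}\,\Gamma(\lambda + m + 1)\Gamma(\lambda + m)\Gamma(2\lambda)},$$

with the ratio $\Gamma(\lambda)/\Gamma(2\lambda)$ read as its limit 2 when $\lambda = 0$, and let the collection of Gegenbauer polynomials $G_\lambda^m(x)$ be defined via the Rodrigues formula

$$G_\lambda^m(x) = (-1)^m \alpha_\lambda^{(m)} \frac{\Gamma(\lambda + m + 1)\Gamma(\lambda + \frac{1}{2})}{\Gamma(\lambda + 1)\Gamma(\lambda + m + \frac{1}{2})} (1 - x^2)^{\frac{1}{2} - \lambda} \frac{d^m}{dx^m} (1 - x^2)^{\lambda + m - \frac{1}{2}}.$$

Then $G_\lambda^m(x)$ are monic, have degree $m$, and satisfy the orthogonality relation

$$\frac{1}{k!} E\,G_\lambda^k(Z_\lambda)G_\lambda^m(Z_\lambda) = \alpha_\lambda^{(m)} \delta_{k,m}, \qquad k = 0, \ldots, m,$$

where $Z_\lambda \sim g_\lambda$ with

$$g_\lambda(x) = \frac{\Gamma(\lambda + 1)}{\sqrt{\pi}\,\Gamma(\lambda + \frac{1}{2})} (1 - x^2)^{\lambda - \frac{1}{2}}, \qquad |x| < 1.$$

In particular Corollary 3.1 obtains, proving the existence of the family of Gegenbauer transformations. This family of distributions is a special case of the centered Pearson Type I distributions, sometimes also called Beta Type I distributions, see [22], p. 150. We note that for $\lambda = 1/2$ we obtain the uniform distribution $\mathcal{U}[-1, 1]$, for $\lambda = 0$ the arcsine law, and for $\lambda = 1$ the semi-circle law.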

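Considering the action of the $G_\lambda^m$ transformation on $Z_\lambda \sim g_\lambda$, for $F \in C_c^\infty$, the Rodrigues formula and integration by parts give

$$E\,G_\lambda^m(Z_\lambda)F(Z_\lambda) = (-1)^m \alpha_\lambda^{(m)} \int_{-1}^{1} F(x) \frac{d^m}{dx^m} g_{\lambda+m}(x)\,dx = \alpha_\lambda^{(m)} \int_{-1}^{1} F^{(m)}(x)\, g_{\lambda+m}(x)\,dx = \alpha_\lambda^{(m)} E F^{(m)}(Z_{\lambda+m}),$$

yielding

$$(Z_\lambda)_\lambda^{(m)} =_d Z_{\lambda+m}.$$

Thus, for $\lambda = 0$ we obtain that the first order Gegenbauer transformation of the arcsine distribution is the semi-circle law.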

Lastly we note that since the above Beta-type distributions are not closed under independent addition, Theorem 3.1 and its construction do not apply. However, as $G_\lambda^1(x) = x$, we recognize the first order Gegenbauer transformation as the zero bias transformation, so that for sums of independent random variables the construction given in (7) applies.
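Two of the Gegenbauer statements can be checked with quadrature rules available in NumPy: the formula for $\alpha_\lambda^{(m)}$ in the uniform case $\lambda = 1/2$, where $G_\lambda^m$ is the monic Legendre polynomial, and the first order transformation taking the arcsine law ($\lambda = 0$, with $\alpha_0^{(1)} = 1/2$, the arcsine variance) to the semi-circle law. The sketch below is ours, with hypothetical function names; it evaluates the Gamma-function formula only for $\lambda > 0$, since at $\lambda = 0$ that formula must be read as a limit.

```python
from math import factorial, gamma

import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial import chebyshev as C
from numpy.polynomial import legendre as L


def alpha_gegenbauer(lam, m):
    """alpha_lam^(m) as displayed at the start of Section 4.5 (requires lam > 0 here)."""
    return (gamma(lam) * gamma(2 * lam + m) * gamma(lam + 1)
            / (2 ** (2 * m) * gamma(lam + m + 1) * gamma(lam + m) * gamma(2 * lam)))


def alpha_uniform_quadrature(m, n=30):
    """lam = 1/2: Z is uniform on [-1, 1] and G^m is the monic Legendre polynomial,
    so alpha = E[G^m(Z)]^2 / m!, computed by Gauss-Legendre quadrature."""
    nodes, weights = L.leggauss(n)
    coeffs = L.leg2poly([0] * m + [1])
    g = Polynomial(coeffs / coeffs[-1])               # normalize to be monic
    return np.sum(weights * g(nodes) ** 2) / 2 / factorial(m)


def arcsine_to_semicircle(F, n=30):
    """Both sides of E Z F(Z) = (1/2) E F'(Z_1): Z arcsine (lam = 0), Z_1 semi-circle
    (lam = 1). Chebyshev-Gauss nodes integrate against 1 / sqrt(1 - x^2)."""
    nodes, _ = C.chebgauss(n)                         # equal weights pi / n
    lhs = np.mean(nodes * F(nodes))                   # density 1 / (pi sqrt(1 - x^2))
    rhs = 0.5 * np.mean(2 * (1 - nodes ** 2) * F.deriv()(nodes))  # density (2/pi) sqrt(1 - x^2)
    return lhs, rhs


print([round(alpha_uniform_quadrature(m), 6) for m in range(4)])
print([round(alpha_gegenbauer(0.5, m), 6) for m in range(4)])
print(arcsine_to_semicircle(Polynomial([0.3, -1.0, 2.0, 0.0, 0.7, 1.1])))
```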